Part II - California Housing Price

by Ojikutu Aisha

Investigation Overview

This presentation shows how housing prices vary with location, in particular with distance from the coast. The main features are housing_median_age (years), median_income (tens of thousands of USD), median_house_value (USD) and ocean_proximity.

Dataset Overview

The data analyzed is the California housing price dataset downloaded from Kaggle. It contains 20,640 observations and 10 features, listed below:

  1. longitude
  2. latitude
  3. housing_median_age (Years)
  4. total_rooms
  5. total_bedrooms
  6. population
  7. households
  8. median_income (tens of thousands of USD)
  9. median_house_value (USD)
  10. ocean_proximity
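Because the units matter when reading the plots, a quick schema and unit check right after loading can help. The sketch below assumes, per the Kaggle dataset description, that median_income is stored in tens of thousands of USD; the helper add_usd_income and the column median_income_usd are hypothetical names, and the toy frame simply reuses three rows from the df.sample(5) output shown further down.

```python
import pandas as pd

# The ten columns listed above, in file order
EXPECTED_COLUMNS = [
    'longitude', 'latitude', 'housing_median_age', 'total_rooms',
    'total_bedrooms', 'population', 'households', 'median_income',
    'median_house_value', 'ocean_proximity',
]


def add_usd_income(df: pd.DataFrame) -> pd.DataFrame:
    """Check the schema, then add a hypothetical `median_income_usd` column.

    Assumes `median_income` is stored in tens of thousands of USD.
    """
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise KeyError(f'missing columns: {sorted(missing)}')
    out = df.copy()  # leave the caller's frame untouched
    out['median_income_usd'] = out['median_income'] * 10_000
    return out


# Three rows copied from the df.sample(5) output in this notebook
toy = pd.DataFrame(
    [[-118.42, 34.06, 52.0, 1881.0, 334.0, 640.0, 321.0, 6.8710, 500001.0, '<1H OCEAN'],
     [-119.03, 35.42, 42.0, 1705.0, 418.0, 905.0, 393.0, 1.6286, 54600.0, 'INLAND'],
     [-117.89, 34.07, 35.0, 834.0, 137.0, 392.0, 123.0, 4.5179, 218800.0, '<1H OCEAN']],
    columns=EXPECTED_COLUMNS,
)
print(add_usd_income(toy)[['median_income', 'median_income_usd']])
```

In the notebook itself this would be applied as df = add_usd_income(df) once the CSV is loaded.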
In [2]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
In [3]:
# Load the dataset into a pandas DataFrame
df = pd.read_csv('housing.csv')
df.sample(5)
Out[3]:
       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
5319     -118.42     34.06                52.0       1881.0           334.0       640.0       321.0         6.8710            500001.0        <1H OCEAN
18277    -122.07     37.35                35.0       1447.0           205.0       619.0       206.0         9.8144            500001.0        <1H OCEAN
2811     -119.03     35.42                42.0       1705.0           418.0       905.0       393.0         1.6286             54600.0           INLAND
5523     -118.36     33.98                40.0       1113.0           234.0       584.0       231.0         3.0927            316000.0        <1H OCEAN
6211     -117.89     34.07                35.0        834.0           137.0       392.0       123.0         4.5179            218800.0        <1H OCEAN
In [4]:
# Drop rows that contain null values
df.dropna(axis=0, inplace=True)

# Change the datatype of count-like features from `float` to `int`
obs = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households']
df[obs] = df[obs].astype('int')
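Order matters in this cleaning cell: an integer column cannot hold NaN, so casting before dropna would fail. The sketch below shows this on a hypothetical two-row frame (not taken from the dataset) and then applies the same drop-then-cast pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: one row is missing total_bedrooms
toy = pd.DataFrame({'total_bedrooms': [334.0, np.nan],
                    'households': [321.0, 206.0]})

# A float column that still holds NaN cannot be cast to int
try:
    toy['total_bedrooms'].astype('int')
    cast_failed = False
except ValueError:  # pandas raises ValueError (or a subclass) for NaN -> int
    cast_failed = True

# Drop incomplete rows first, then cast all count columns in one call
counts = ['total_bedrooms', 'households']
clean = toy.dropna(axis=0).astype({c: 'int' for c in counts})
print(cast_failed, clean.dtypes.tolist())
```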

Income and House Value

Median household income is positively correlated with median house value, as the plot below shows:

In [5]:
# Scatter plot of house value and income
sb.scatterplot(data=df, x='median_income', y='median_house_value')
plt.xlabel('Median Income [Tens of Thousands of USD]')
plt.ylabel('House Value [USD]')
plt.title('Income vs House Value');
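To put a number on the trend in the scatter plot, the correlation can be computed directly. This is a minimal sketch with a hypothetical helper income_value_corr; it reports Pearson correlation and, because house values in this dataset are capped at 500,001 USD (two of the sampled rows sit exactly there), also a rank-based Spearman coefficient, computed as Pearson on ranks so SciPy is not needed. The five rows are the df.sample(5) output above and only illustrate the call; in the notebook one would pass the full df.

```python
import pandas as pd


def income_value_corr(df: pd.DataFrame) -> dict:
    """Pearson and Spearman correlation of income vs house value.

    Spearman is computed as Pearson on ranks (ties get average ranks),
    which avoids needing SciPy.
    """
    x, y = df['median_income'], df['median_house_value']
    return {
        'pearson': x.corr(y),
        'spearman': x.rank().corr(y.rank()),
    }


# The five rows from the df.sample(5) output above (illustration only)
sample = pd.DataFrame({
    'median_income': [6.8710, 9.8144, 1.6286, 3.0927, 4.5179],
    'median_house_value': [500001.0, 500001.0, 54600.0, 316000.0, 218800.0],
})
print(income_value_corr(sample))
```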

Housing Age and House Value

The plot below shows no clear relationship between housing median age and median house value.

In [6]:
# Scatter plot of house age and house value
sb.scatterplot(data=df, x='housing_median_age', y='median_house_value')
plt.xlabel('Housing Age [Years]')
plt.ylabel('House Value [USD]')
plt.title('Housing Age vs House Value');
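A dense scatter plot can hide a weak trend, so one way to cross-check the "no clear relationship" reading is to bin housing age and compare the median house value per bin; roughly flat medians across bins would support it. A minimal sketch: the function name value_by_age_bin and the 10-year bin width are assumptions, and the five sample rows only demonstrate the mechanics, not a result about the full data.

```python
import pandas as pd


def value_by_age_bin(df: pd.DataFrame, width: int = 10) -> pd.Series:
    """Median house value within housing-age bins of `width` years.

    Bins are left-closed, e.g. [30, 40), and empty bins are dropped.
    """
    top = int(df['housing_median_age'].max()) + width
    edges = list(range(0, top + 1, width))
    age_bin = pd.cut(df['housing_median_age'], bins=edges, right=False)
    return df.groupby(age_bin, observed=True)['median_house_value'].median()


# The five rows from the df.sample(5) output above (illustration only)
sample = pd.DataFrame({
    'housing_median_age': [52.0, 35.0, 42.0, 40.0, 35.0],
    'median_house_value': [500001.0, 500001.0, 54600.0, 316000.0, 218800.0],
})
print(value_by_age_bin(sample))
```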

Housing Location and House Value

House value varies with location: the closer a house is to the coast, the higher its value tends to be.

In [7]:
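# Interactive map: one point per row, coloured by median house value;
# hovering over a point shows its ocean_proximity category
fig = px.scatter_mapbox(df,
                        lat='latitude',
                        lon='longitude',
                        center={'lat': 37.09, 'lon': -121},  # roughly central California
                        height=600,
                        width=600,
                        color='median_house_value',
                        hover_data=['ocean_proximity'])
# OpenStreetMap tiles do not require a Mapbox access token
fig.update_layout(mapbox_style='open-street-map', title='Housing Price and Location')
fig.show()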
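The map conveys the coastal pattern visually; a small table of median house value per ocean_proximity category puts numbers on it. This is a sketch with a hypothetical helper value_by_proximity; the five rows are the df.sample(5) output shown earlier and are for illustration only, so in the notebook one would call it on the full df.

```python
import pandas as pd


def value_by_proximity(df: pd.DataFrame) -> pd.DataFrame:
    """Median house value and number of rows per ocean_proximity category,
    sorted from most to least expensive."""
    return (df.groupby('ocean_proximity')['median_house_value']
              .agg(['median', 'count'])
              .sort_values('median', ascending=False))


# The five rows from the df.sample(5) output earlier (illustration only)
sample = pd.DataFrame({
    'ocean_proximity': ['<1H OCEAN', '<1H OCEAN', 'INLAND', '<1H OCEAN', '<1H OCEAN'],
    'median_house_value': [500001.0, 500001.0, 54600.0, 316000.0, 218800.0],
})
print(value_by_proximity(sample))
```

Reporting the count alongside the median makes it easy to spot categories with too few rows to compare reliably.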

Generate Slideshow: once the presentation is ready, use the jupyter nbconvert command to generate the HTML slideshow. Run the following command from a terminal or command line (in a notebook cell, as here, prefix it with !).

In [8]:
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt
[NbConvertApp] Converting notebook Part_II_slide_deck_template.ipynb to slides
[NbConvertApp] Writing 1497128 bytes to Part_II_slide_deck_template.slides.html
[NbConvertApp] Redirecting reveal.js requests to https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.5.0
Traceback (most recent call last):
  File "C:\Users\TIMOTHY\anaconda3\Scripts\jupyter-nbconvert-script.py", line 10, in <module>
    sys.exit(main())
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\jupyter_core\application.py", line 264, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
    app.start()
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 369, in start
    self.convert_notebooks()
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 541, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 508, in convert_single_notebook
    self.postprocess_single_notebook(write_results)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\nbconvertapp.py", line 480, in postprocess_single_notebook
    self.postprocessor(write_results)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\postprocessors\base.py", line 28, in __call__
    self.postprocess(input)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\nbconvert\postprocessors\serve.py", line 90, in postprocess
    http_server.listen(self.port, address=self.ip)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\tornado\tcpserver.py", line 151, in listen
    sockets = bind_sockets(port, address=address)
  File "C:\Users\TIMOTHY\anaconda3\lib\site-packages\tornado\netutil.py", line 161, in bind_sockets
    sock.bind(sockaddr)
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

This should open a browser tab in which you can scroll through the presentation; sub-slides are reached by pressing 'down' while viewing their parent slide. Remove all quote-formatted guide notes like this one before finishing the presentation. Finally, stop the kernel.
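The traceback above ends in OSError: [WinError 10048]. The conversion itself succeeded (the log reports writing Part_II_slide_deck_template.slides.html), but the serve post-processor could not bind its port in http_server.listen(self.port, ...), most likely because an earlier --post serve run was still holding it. One possible workaround, sketched here as an assumption rather than the template's prescribed method, is to ask the OS for a free port and pass it through nbconvert's ServePostProcessor.port option; dropping --post serve and opening the written HTML file directly also works. The helper names below are hypothetical.

```python
import socket
import subprocess
from typing import List, Optional


def find_free_port(host: str = '127.0.0.1') -> int:
    """Let the OS pick an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def build_slides_cmd(notebook: str, port: int) -> List[str]:
    """The nbconvert command used in this notebook, plus an explicit serve port."""
    return [
        'jupyter', 'nbconvert', notebook,
        '--to', 'slides', '--post', 'serve',
        '--no-input', '--no-prompt',
        f'--ServePostProcessor.port={port}',
    ]


def serve_slides(notebook: str, port: Optional[int] = None) -> None:
    """Convert and serve the deck; blocks until the server is interrupted."""
    cmd = build_slides_cmd(notebook, port or find_free_port())
    subprocess.run(cmd, check=True)


print(' '.join(build_slides_cmd('Part_II_slide_deck_template.ipynb', 8910)))
```

Another process could still take the chosen port between the lookup and nbconvert binding it, so this narrows the conflict rather than ruling it out.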